Tagging and Glossing Sesotho
Abstract
This paper describes a system for morphological tagging and glossing of Sesotho, a southern Bantu language. Sesotho has a rich agglutinative morphology, and morphemes cannot be disambiguated on the basis of the bigram or trigram statistics that work so well for languages like English. Our system estimates a simple PCFG for Sesotho clauses from a small hand-annotated corpus in an unsupervised manner. It uses this PCFG and a small set of hand-coded constraints to produce a ranked list of possible tags and corresponding glosses for untagged clauses.
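The ranking step described in the abstract can be illustrated with a minimal sketch: enumerate the tag/gloss readings of each morpheme, keep only the readings whose tag sequence a PCFG can derive, discard those that violate hand-coded constraints, and sort the survivors by probability. Everything here is hypothetical and for illustration only: the toy lexicon (`LEXICON`), grammar (`RULES`), emission table (`EMIT`), the function names, and all probabilities are invented, the morpheme forms and glosses are illustrative rather than a verified analysis of Sesotho, and the sketch does not show how the paper estimates its PCFG from the corpus.

```python
from itertools import product
from typing import Callable, Sequence

Reading = tuple[str, str]  # (tag, gloss)

# Hypothetical toy lexicon: surface morpheme -> possible (tag, gloss) readings.
# "o" is ambiguous between two subject markers; "a" between a tense marker and a final vowel.
LEXICON: dict[str, list[Reading]] = {
    "o": [("SM", "2sg"), ("SM", "1")],
    "a": [("TM", "PRES"), ("FV", "IND")],
    "rat": [("V", "love")],
}

# Hypothetical toy PCFG: nonterminal -> list of (right-hand side, probability). Values are invented.
RULES: dict[str, list[tuple[tuple[str, ...], float]]] = {
    "Clause": [(("SM", "TM", "Verb"), 0.3), (("SM", "Verb"), 0.7)],
    "Verb": [(("V", "FV"), 1.0)],
}

# Hypothetical P(reading | tag), also invented.
EMIT: dict[Reading, float] = {
    ("SM", "2sg"): 0.4, ("SM", "1"): 0.6,
    ("TM", "PRES"): 1.0, ("FV", "IND"): 1.0, ("V", "love"): 1.0,
}

Constraint = Callable[[list[Reading]], bool]


def expansions(symbol: str) -> dict[tuple[str, ...], float]:
    """Map each preterminal tag sequence a symbol can yield to its total probability.
    Assumes a non-recursive grammar (true for this toy grammar)."""
    if symbol not in RULES:  # a preterminal tag yields only itself
        return {(symbol,): 1.0}
    out: dict[tuple[str, ...], float] = {}
    for rhs, p in RULES[symbol]:
        partial: dict[tuple[str, ...], float] = {(): p}
        for child in rhs:
            nxt: dict[tuple[str, ...], float] = {}
            for seq, sp in partial.items():
                for cseq, cp in expansions(child).items():
                    key = seq + cseq
                    nxt[key] = nxt.get(key, 0.0) + sp * cp  # sum over derivations
            partial = nxt
        for seq, sp in partial.items():
            out[seq] = out.get(seq, 0.0) + sp
    return out


def rank_analyses(morphs: list[str],
                  constraints: Sequence[Constraint] = ()) -> list[tuple[float, list[Reading]]]:
    """Return (probability, analysis) pairs for a segmented clause, best first."""
    tag_probs = expansions("Clause")
    scored: list[tuple[float, list[Reading]]] = []
    for readings in product(*(LEXICON[m] for m in morphs)):
        tags = tuple(tag for tag, _ in readings)
        if tags not in tag_probs:
            continue  # tag sequence not derivable from the grammar
        analysis = list(readings)
        if not all(check(analysis) for check in constraints):
            continue  # ruled out by a hand-coded constraint
        p = tag_probs[tags]
        for r in readings:
            p *= EMIT[r]
        scored.append((p, analysis))
    return sorted(scored, key=lambda item: item[0], reverse=True)


if __name__ == "__main__":
    for p, analysis in rank_analyses(["o", "a", "rat", "a"]):
        print(f"{p:.2f}", "-".join(gloss for _, gloss in analysis))
```

With the invented numbers above, the class-1 reading (`0.18 1-PRES-love-IND`) is ranked above the second-person reading (`0.12 2sg-PRES-love-IND`). A hypothetical agreement constraint such as `lambda a: a[0] == ("SM", "1")` would remove the second reading entirely. The exhaustive `product` enumeration is exponential in clause length and is used here only for clarity; a practical system would more likely use a chart parser over the PCFG.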
Similar resources
Automatic interlinear glossing as two-level sequence classification
Interlinear glossing is a type of annotation of morphosyntactic categories and crosslinguistic lexical correspondences that allows linguists to analyse sentences in languages that they do not necessarily speak. Automating this annotation is necessary in order to provide glossed corpora big enough to be used for quantitative studies. In this paper, we present experiments on the automatic gloss...
The Effect of Visual Representation, Textual Representation, and Glossing on Second Language Vocabulary Learning
In this study, the researcher chose three different vocabulary techniques (Visual Representation, Textual Enhancement, and Glossing) and compared them with the traditional method of teaching vocabulary. Eighty advanced EFL learners were assigned to four intact groups (three experimental and one control group) using a proficiency test and a vocabulary test as a pre-test. In the visual group, stu...
Empirical measurements on a Sesotho tone labeling algorithm
This article discusses the empirical assessments employed on two versions of a Sesotho tone labeling algorithm. This algorithm uses linguistically-defined Sesotho tonal rules to predict the tone labels on the syllables of Sesotho words. The two versions differed in the number of tonal rules that they employ as well as the lexical categories that the tone rules apply to. Both versions were tested o...
MADA+TOKAN: A Toolkit for Arabic Tokenization, Diacritization, Morphological Disambiguation, POS Tagging, Stemming and Lemmatization
We describe the MADA+TOKAN toolkit, a versatile and freely available system that can derive extensive morphological and contextual information from raw Arabic text, and then use this information for a multitude of crucial NLP tasks. Applications include high-accuracy part-of-speech tagging, diacritization, lemmatization, disambiguation, stemming, and glossing. MADA operates by examining a list ...
Utility of the Koppitz norms for the Bender Gestalt Test performance of a group of Sesotho-speaking children.
OBJECTIVE This study investigated the utility of the Koppitz administration, scoring and norms for the Bender Gestalt Test (BGT) as a neurocognitive screening instrument for Sesotho-speaking children. METHOD The BGT protocols of 671 Sesotho-speaking children between the ages of seven and nine were reviewed. Data pertaining to socioeconomic status were also gathered for 360 of the participants...